How to Calculate McNemar’s Test to Compare Two Machine Learning Classifiers

Statistical Hypothesis Tests for Deep Learning

Contingency Table

McNemar’s Test Statistic

Test statistic: (Yes/No - No/Yes)^2 / (Yes/No + No/Yes)

For machine learning models, the Yes/No cells can apparently be replaced with classifier 1 correct/wrong versus classifier 2 correct/wrong.
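As a rough sketch of the classifier version, the table can be built by checking each test example against both models' predictions, then plugging the two disagreement cells into the statistic above. The function names and the toy labels and predictions are made up for illustration; they are not from the original article.

```python
def contingency_table(y_true, pred1, pred2):
    """Build a 2x2 table of counts.

    Rows: classifier 1 correct (0) / wrong (1).
    Cols: classifier 2 correct (0) / wrong (1).
    """
    table = [[0, 0], [0, 0]]
    for truth, p1, p2 in zip(y_true, pred1, pred2):
        row = 0 if p1 == truth else 1
        col = 0 if p2 == truth else 1
        table[row][col] += 1
    return table


def mcnemar_statistic(table):
    """(Yes/No - No/Yes)^2 / (Yes/No + No/Yes), without continuity correction."""
    yes_no = table[0][1]  # classifier 1 correct, classifier 2 wrong
    no_yes = table[1][0]  # classifier 1 wrong, classifier 2 correct
    if yes_no + no_yes == 0:
        raise ValueError("no disagreements: the statistic is undefined")
    return (yes_no - no_yes) ** 2 / (yes_no + no_yes)


# Hypothetical toy data (10 test examples, binary labels).
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
pred1 = [1, 0, 1, 1, 0, 1, 0, 1, 0, 1]
pred2 = [1, 0, 0, 1, 1, 1, 0, 1, 1, 0]

table = contingency_table(y_true, pred1, pred2)
statistic = mcnemar_statistic(table)
print(table)      # [[5, 3], [1, 1]]
print(statistic)  # 1.0
```

Only the off-diagonal cells (where the two classifiers disagree) enter the statistic; the cases where both are right or both are wrong do not affect it.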

Given the selection of a significance level, the p-value calculated by the test can be interpreted as follows:

p > alpha: fail to reject H0, no difference in the disagreement (e.g. treatment had no effect).

p <= alpha: reject H0, significant difference in the disagreement (e.g. treatment had an effect).
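The decision rule above can be sketched in a few lines. This assumes the statistic is compared against a chi-squared distribution with 1 degree of freedom (the usual large-sample approximation for McNemar's test) and that alpha = 0.05; both the helper names and the alpha value are illustrative choices, not something fixed by these notes.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom.

    For 1 df, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))


def interpret(statistic, alpha=0.05):
    """Return (p_value, decision) for a McNemar statistic."""
    p_value = chi2_sf_1df(statistic)
    if p_value > alpha:
        return p_value, "fail to reject H0"
    return p_value, "reject H0"


p_small_stat, decision_small = interpret(1.0)
p_large_stat, decision_large = interpret(9.0)
print(round(p_small_stat, 4), decision_small)  # 0.3173 fail to reject H0
print(round(p_large_stat, 4), decision_large)  # 0.0027 reject H0
```

When the disagreement counts are small, the chi-squared approximation is unreliable and an exact binomial test is the safer choice.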

Interpret the McNemar’s Test for Classifiers

1. No Measure of Training Set or Model Variability

2. Less Direct Comparison of Models

McNemar’s Test in Python

Implementation using statsmodels
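A minimal sketch with the `mcnemar` function from `statsmodels.stats.contingency_tables`. The table values and alpha = 0.05 are made-up examples. With `exact=True` the function runs a binomial test and reports the smaller disagreement count as the statistic, so it will not match the chi-squared formula above; `exact=False` gives the chi-squared version, with `correction` toggling the continuity correction.

```python
from statsmodels.stats.contingency_tables import mcnemar

# Hypothetical 2x2 table:
# rows = classifier 1 correct/wrong, cols = classifier 2 correct/wrong.
table = [[4, 2],
         [1, 3]]

# Exact binomial test, suited to small disagreement counts.
result = mcnemar(table, exact=True)
print("statistic=%.3f, p-value=%.3f" % (result.statistic, result.pvalue))

# Chi-squared version without continuity correction,
# i.e. (2 - 1)^2 / (2 + 1) for this table.
result_chi2 = mcnemar(table, exact=False, correction=False)
print("statistic=%.3f, p-value=%.3f" % (result_chi2.statistic, result_chi2.pvalue))

alpha = 0.05  # assumed significance level
if result.pvalue > alpha:
    print("Same proportions of errors (fail to reject H0)")
else:
    print("Different proportions of errors (reject H0)")
```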